Hopfield Net
- So far, the neural networks we have used for computation have all been feedforward structures
Loopy network
- Each neuron is a perceptron with +1/-1 output
- Every neuron receives input from every other neuron
- Every neuron outputs signals to every other neuron
- At each time step, each neuron receives a “field” $\sum_{j\ne i} w_{ji} y_j + b_i$ from the other neurons
- If the sign of the field matches its own sign, it does not respond
- If the sign of the field opposes its own sign, it “flips” to match the sign of the field
- A flip changes the field at the other nodes
- Which may then flip... and so on...
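A minimal sketch of this update rule in NumPy, assuming the state is a vector `y` of +1/-1 values and the weights sit in a symmetric matrix `W` with zero diagonal; the names are illustrative, not from the lecture.

```python
import numpy as np

def update_neuron(y, W, b, i):
    """One asynchronous update of neuron i in a +1/-1 Hopfield state vector y."""
    field = W[i] @ y + b[i]            # the "field" seen by neuron i (W[i, i] == 0)
    if field != 0 and np.sign(field) != y[i]:
        y[i] = np.sign(field)          # sign mismatch: the neuron flips
    return y
```

Applying `update_neuron` repeatedly to randomly chosen neurons is exactly the evolution described in the rest of this section.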
Flip behavior
Let $y_i^-$ be the output of the $i$-th neuron just before it responds to the current field
Let $y_i^+$ be the output of the $i$-th neuron just after it responds to the current field
If $y_i^- = \mathrm{sign}\!\left(\sum_{j\ne i} w_{ji} y_j + b_i\right)$, then $y_i^+ = y_i^-$
If the sign of the field matches its own sign, it does not flip
If $y_i^- \ne \mathrm{sign}\!\left(\sum_{j\ne i} w_{ji} y_j + b_i\right)$, then $y_i^+ = -y_i^-$, and
$y_i^+\left(\sum_{j\ne i} w_{ji} y_j + b_i\right) - y_i^-\left(\sum_{j\ne i} w_{ji} y_j + b_i\right) = 2\,y_i^+\left(\sum_{j\ne i} w_{ji} y_j + b_i\right)$
This term is always positive!
Every flip of a neuron is guaranteed to locally increase $y_i\left(\sum_{j\ne i} w_{ji} y_j + b_i\right)$
Globally
- Consider the following sum across all nodes:
  $D(y_1, \dots, y_N) = \frac{1}{2}\sum_i y_i \sum_{j\ne i} w_{ji} y_j + \sum_i b_i y_i$
  (the $\frac{1}{2}$ avoids counting each symmetric weight twice)
- Assume $w_{ii} = 0$ and symmetric weights, $w_{ji} = w_{ij}$
- For any unit $k$ that “flips” because of the local field:
  $\Delta D(y_k) = D(y_1, \dots, y_k^+, \dots, y_N) - D(y_1, \dots, y_k^-, \dots, y_N) = (y_k^+ - y_k^-)\left(\sum_{j\ne k} w_{jk} y_j + b_k\right)$
- This is always positive, since $y_k^+ - y_k^- = 2y_k^+$ and $y_k^+$ matches the sign of the field!
- Every flip of a unit results in an increase in $D$
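The claim can be checked numerically. The sketch below builds a random symmetric network (arbitrary sizes, not from the lecture), flips one unit whose sign opposes its field, and verifies that $D$ increases by exactly $(y_k^+ - y_k^-)$ times that field.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 8
W = rng.standard_normal((N, N))
W = (W + W.T) / 2                      # symmetric weights
np.fill_diagonal(W, 0)                 # w_ii = 0
b = rng.standard_normal(N)
y = rng.choice([-1.0, 1.0], size=N)

def D(y):
    # the 1/2 avoids double-counting each symmetric weight
    return 0.5 * y @ W @ y + b @ y

fields = W @ y + b
mismatched = np.flatnonzero(np.sign(fields) != y)   # units whose sign opposes their field
if mismatched.size:
    k = mismatched[0]
    y_new = y.copy()
    y_new[k] = np.sign(fields[k])                   # flip unit k to match its field
    delta = D(y_new) - D(y)
    # the increase equals (y_k^+ - y_k^-) times the field at k, and is positive
    assert np.isclose(delta, (y_new[k] - y[k]) * fields[k]) and delta > 0
```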
Overall
- Flipping a unit will result in an increase (non-decrease) of $D$
- $D$ is bounded: $D_{\max} = \frac{1}{2}\sum_i \sum_{j\ne i} |w_{ij}| + \sum_i |b_i|$
- The minimum increment of $D$ in a flip is $\Delta D_{\min} = \min_{i,\,\{y_j\}} 2\left|\sum_{j\ne i} w_{ji} y_j + b_i\right|$, taken over configurations where the field is non-zero
- Any sequence of flips must therefore converge in a finite number of steps
- Think of this as an infinitely deep network in which the weights at every layer are identical
- Find the maximum layer!
The Energy of a Hopfield Net
- Define the Energy of the network as
  $E = -\frac{1}{2}\sum_i \sum_{j\ne i} w_{ij} y_i y_j - \sum_i b_i y_i$
- This is just the negative of $D$
- The evolution of a Hopfield network constantly decreases its energy
- This is analogous to the potential energy of a spin glass (magnetic dipoles)
- The system will evolve until the energy hits a local minimum
- We drop the bias term for simplicity: $E = -\frac{1}{2}\sum_i \sum_{j\ne i} w_{ij} y_i y_j$
- The network will evolve until it arrives at a local minimum in the energy contour
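A sketch of this energy function, assuming `y` is a +1/-1 state vector and `W` a symmetric zero-diagonal weight matrix (argument names are my own):

```python
import numpy as np

def energy(y, W, b=None):
    """Hopfield energy E = -1/2 * sum_{i != j} w_ij y_i y_j - sum_i b_i y_i."""
    E = -0.5 * y @ W @ y           # W is symmetric with zero diagonal
    if b is not None:
        E -= b @ y                 # omit when the network has no bias
    return E
```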
Content-addressable memory
- Each of the minima is a “stored” pattern
- If the network is initialized close to a stored pattern, it will inevitably evolve to the pattern
- This is a content addressable memory
- Recall memory content from partial or corrupt values
- Also called associative memory
- Evolve and recall pattern by content, not by location
Evolution
- The network will evolve until it arrives at a local minimum in the energy contour
- We proved that every change in the network results in a decrease in energy
- So the path to the energy minimum is monotonic
For a 2-neuron net
- The energy contour is symmetric: $E(y_1, y_2) = E(-y_1, -y_2)$
- If $(y_1, y_2)$ is a local minimum, so is $(-y_1, -y_2)$
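The symmetry is easy to see by enumerating all four states of a hypothetical 2-neuron net with a single positive weight and no bias (an illustrative choice):

```python
import numpy as np
from itertools import product

W = np.array([[0.0, 1.0],
              [1.0, 0.0]])              # single symmetric weight w_12 = w_21 = 1, no bias

for state in product([-1.0, 1.0], repeat=2):
    y = np.array(state)
    E = -0.5 * y @ W @ y                # energy of this state
    E_neg = -0.5 * (-y) @ W @ (-y)      # energy of the negated state
    assert E == E_neg                   # the energy contour is symmetric
    print(state, E)
```

For this choice the two minima, (+1, +1) and (-1, -1), are negations of each other, as claimed.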
Computational algorithm
- Very simple
- Updates can be done sequentially, or all at once
- Converged when the state does not change any more (see the sketch below)
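A minimal sketch of the whole procedure under the assumptions used above: sequential updates in random order, stopping when a full sweep produces no flips. Function and parameter names are my own.

```python
import numpy as np

def evolve(y, W, b=None, max_sweeps=100, rng=None):
    """Run asynchronous Hopfield updates until no neuron flips during a full sweep."""
    rng = np.random.default_rng() if rng is None else rng
    y = np.asarray(y, dtype=float).copy()
    b = np.zeros(len(y)) if b is None else b
    for _ in range(max_sweeps):
        flipped = False
        for i in rng.permutation(len(y)):        # sequential updates in random order
            field = W[i] @ y + b[i]              # local field at neuron i
            if field != 0 and np.sign(field) != y[i]:
                y[i] = np.sign(field)            # flip to match the field
                flipped = True
        if not flipped:                          # converged: the state no longer changes
            return y
    return y
```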
Issues
- How do we make the network store a specific pattern or set of patterns?
- How many patterns can we actually store?
Store a specific pattern
- A network can store multiple patterns
- Every stable point is a stored pattern
- So we could design the net to store multiple patterns
- Remember that every stored pattern $\mathbf{y}^p$ is actually two stored patterns, $\mathbf{y}^p$ and $-\mathbf{y}^p$
- How can the quadratic energy have multiple minima, when a quadratic is convex (or concave)?
- Because the input is constrained: $\mathbf{y} \in \{-1, +1\}^N$, the corners of the hypercube, so only these discrete points compete
- Hebbian learning: set $w_{ji} = y_j^p y_i^p$
- This makes the target pattern stationary, since for every $i$:
  $\mathrm{sign}\!\left(\sum_{j\ne i} w_{ji} y_j^p\right) = \mathrm{sign}\!\left(\sum_{j\ne i} y_j^p y_i^p y_j^p\right) = \mathrm{sign}\!\left((N-1)\, y_i^p\right) = y_i^p$
- Energy of the stored pattern: $E(\mathbf{y}^p) = -\frac{1}{2}\sum_i\sum_{j\ne i} w_{ij} y_i^p y_j^p = -\frac{1}{2}\sum_i\sum_{j\ne i} (y_i^p)^2 (y_j^p)^2 = -\frac{N(N-1)}{2}$
- This is the lowest possible energy value for the network
- The stored pattern has the lowest energy
- No matter where the network begins, it will evolve into the stored pattern (the lowest-energy state); see the sketch below
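A sketch that stores one arbitrary 8-bit pattern with the Hebbian rule, checks that it is stationary at the minimum energy $-N(N-1)/2$, and then recovers it from a corrupted copy, i.e. content-addressable recall (the pattern and sizes are illustrative):

```python
import numpy as np

# Store one arbitrary 8-bit pattern and recall it from a corrupted copy
y_p = np.array([1.0, -1.0, -1.0, 1.0, 1.0, -1.0, 1.0, -1.0])
N = len(y_p)

W = np.outer(y_p, y_p)                 # Hebbian rule: w_ji = y_j^p * y_i^p
np.fill_diagonal(W, 0)                 # no self-connections

# The stored pattern is stationary and sits at the lowest possible energy -N(N-1)/2
assert np.array_equal(np.sign(W @ y_p), y_p)
assert -0.5 * y_p @ W @ y_p == -N * (N - 1) / 2

# Content-addressable recall: flip two bits, then let the network evolve
y = y_p.copy()
y[[0, 3]] *= -1
for _ in range(10):                    # a few asynchronous sweeps are plenty here
    for i in range(N):
        field = W[i] @ y
        if field != 0:
            y[i] = np.sign(field)
assert np.array_equal(y, y_p)          # the corrupted input evolves back to the pattern
```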
How many patterns can we store?
- To store more than one pattern, sum the Hebbian terms over all of them (see the sketch below): $w_{ji} = \sum_{\mathbf{y}^p \in P} y_j^p y_i^p$
- $P$ is the set of patterns to store
- The superscript $p$ indexes the specific pattern
- Hopfield: a network of $N$ neurons can store up to ~$0.14N$ patterns through Hebbian learning (derivation in the slides)
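A quick empirical check of Hebbian storage of several random patterns; `N` and `K` below are arbitrary choices, and since ~0.14N is an asymptotic estimate the count for any specific random draw will vary.

```python
import numpy as np

rng = np.random.default_rng(0)
N, K = 100, 10                                    # K = 10 patterns < 0.14 * 100 = 14

patterns = rng.choice([-1.0, 1.0], size=(K, N))   # the set P of patterns to store
W = patterns.T @ patterns                         # w_ji = sum_p y_j^p * y_i^p
np.fill_diagonal(W, 0)

# A pattern is stored (stationary) if every bit agrees with the sign of its field
stationary = sum(np.array_equal(np.sign(W @ p), p) for p in patterns)
print(f"{stationary} of {K} patterns are stationary")
```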
Orthogonal / non-orthogonal patterns
Orthogonal patterns
- Patterns are local minima (stationary and stable)
- No other local minima exist
- But patterns perfectly confusable for recall
Non-orthogonal patterns
- Patterns are local minima (stationary and stable)
- No other local minima exist
- Actual wells for patterns
- Patterns may be perfectly recalled! (Note K > 0.14 N)
Two orthogonal 6-bit patterns
- Perfectly stationary and stable
- Several spurious “fake-memory” local minima...
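Spurious minima can be found by brute force. The sketch below uses an illustrative set of three mutually orthogonal 8-bit patterns (my own choice, not the lecture's figure), enumerates all 2^8 states, and separates stored minima from spurious ones.

```python
import numpy as np
from itertools import product

# Three mutually orthogonal 8-bit patterns (illustrative choice)
patterns = np.array([[ 1,  1,  1,  1, -1, -1, -1, -1],
                     [ 1, -1,  1, -1,  1, -1,  1, -1],
                     [ 1,  1, -1, -1,  1,  1, -1, -1]], dtype=float)
N = patterns.shape[1]

W = patterns.T @ patterns                       # Hebbian weights over all patterns
np.fill_diagonal(W, 0)

stored = {tuple(p) for p in patterns} | {tuple(-p) for p in patterns}
stored_minima, spurious_minima = [], []
for bits in product([-1.0, 1.0], repeat=N):
    y = np.array(bits)
    if np.all(y * (W @ y) >= 0):                # stable: no single flip lowers the energy
        (stored_minima if bits in stored else spurious_minima).append(y)

print(len(stored_minima), "stored minima,", len(spurious_minima), "spurious minima")
```

Any stable state that is not a stored pattern or its negation is one of the “parasitic” memories discussed in the observations below.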
Observations
- Many “parasitic” patterns
- Undesired patterns that also become stable or attractors
- Patterns that are non-orthogonal are easier to remember
- I.e., patterns that are closer are easier to remember than patterns that are farther apart!
- Seems possible to store K > 0.14N patterns
- i.e. obtain a weight matrix W such that K > 0.14N patterns are stationary
- Possible to make more than 0.14N patterns at least 1-bit stable